Deep representation learning for human motion prediction and classification
Generative models of 3D human motion are often restricted to a small number
of activities and therefore cannot generalize well to novel movements or
applications. In this work we propose a deep learning framework for human
motion capture data that learns a generic representation from a large corpus of
motion capture data and generalizes well to new, unseen, motions. Using an
encoding-decoding network that learns to predict future 3D poses from the most
recent past, we extract a feature representation of human motion. Most work on
deep learning for sequence prediction focuses on video and speech. Since
skeletal data has a different structure, we present and evaluate different
network architectures that make different assumptions about time dependencies
and limb correlations. To quantify the learned features, we use the output of
different layers for action classification and visualize the receptive fields
of the network units. Our method outperforms the recent state of the art in
skeletal motion prediction, even though those methods use action-specific training data.
Our results show that deep feedforward networks, trained from a generic mocap
database, can successfully be used for feature extraction from human motion
data and that this representation can be used as a foundation for
classification and prediction.

Comment: This paper was published at the IEEE Conference on Computer Vision and
Pattern Recognition (CVPR), 201
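As a rough illustration of the encoding-decoding idea (not the paper's actual architecture), the sketch below maps a window of past poses to a bottleneck feature and decodes predicted future poses from it. All dimensions, and the randomly initialised weights standing in for a trained network, are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: 25 joints x 3 coordinates = 75-dim pose vectors,
# a window of 10 past frames, predicting 5 future frames.
POSE_DIM, PAST, FUTURE, FEAT = 75, 10, 5, 128

# Randomly initialised weights stand in for a trained network.
W_enc = rng.normal(0, 0.01, (PAST * POSE_DIM, FEAT))
W_dec = rng.normal(0, 0.01, (FEAT, FUTURE * POSE_DIM))

def encode(past_poses):
    """Map a (PAST, POSE_DIM) window to a FEAT-dim motion representation."""
    h = past_poses.reshape(-1) @ W_enc
    return np.tanh(h)          # bottleneck feature, reusable for classification

def decode(feature):
    """Map the representation back to (FUTURE, POSE_DIM) predicted poses."""
    return (feature @ W_dec).reshape(FUTURE, POSE_DIM)

past = rng.normal(size=(PAST, POSE_DIM))   # dummy mocap window
feature = encode(past)
pred = decode(feature)
print(feature.shape, pred.shape)           # (128,) (5, 75)
```

The same bottleneck feature could then be fed to a classifier, which is how the abstract's action-classification evaluation reuses the learned representation.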
Simultaneous Measurement Imputation and Outcome Prediction for Achilles Tendon Rupture Rehabilitation
Achilles Tendon Rupture (ATR) is one of the typical soft tissue injuries.
Rehabilitation after such a musculoskeletal injury remains a prolonged process
with a very variable outcome. Accurately predicting rehabilitation outcome is
crucial for treatment decision support. However, it is challenging to train an
automatic method for predicting the ATR rehabilitation outcome from treatment
data, due to a massive amount of missing entries in the data recorded from ATR
patients, as well as complex nonlinear relations between measurements and
outcomes. In this work, we design an end-to-end probabilistic framework to
impute missing data entries and predict rehabilitation outcomes simultaneously.
We evaluate our model on a real-life ATR clinical cohort, comparing with
various baselines. The proposed method demonstrates clear superiority over
traditional approaches, which typically perform imputation and prediction in two
separate stages.
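A toy sketch of the joint idea on synthetic data: instead of imputing once and then fitting an outcome model, the loop below alternates imputation and model fitting so that the outcome informs the imputed values. The cohort, the linear outcome model, and the crude coordinate update are all stand-ins for the paper's probabilistic framework:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in for an ATR-style cohort: n patients, d measurements,
# a linear outcome, and roughly 30% of entries missing at random.
n, d = 200, 5
X_true = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X_true @ w_true + 0.1 * rng.normal(size=n)
mask = rng.random((n, d)) < 0.3            # True = missing entry
X_obs = np.where(mask, np.nan, X_true)

# Joint (iterative) scheme: alternate imputation and outcome-model fitting,
# so outcome residuals feed back into the imputed values.
X = np.where(mask, np.nanmean(X_obs, axis=0), X_obs)  # start from mean imputation
for _ in range(10):
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    for j in range(d):
        if abs(w[j]) < 1e-8:
            continue
        # Nudge each missing entry toward the value that best explains y
        # (a crude coordinate update, standing in for the probabilistic model).
        resid = y - X @ w
        X[mask[:, j], j] += resid[mask[:, j]] * w[j] / (w @ w)

pred = X @ w
print("train RMSE:", np.sqrt(np.mean((pred - y) ** 2)))
```

A two-stage baseline would stop after the initial mean imputation and a single fit; the alternation above is what "simultaneous" buys in this toy setting.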
Learn the Time to Learn: Replay Scheduling in Continual Learning
Replay methods have proven successful in mitigating catastrophic
forgetting in continual learning scenarios despite having only limited access to
historical data. However, even when storing historical data is cheap, as in many
real-world applications, replaying all of it is prohibitive due to
processing-time constraints. In such settings, we propose learning the time to
learn for a continual learning system, in which we learn replay schedules that
specify which tasks to replay at different time steps. To demonstrate the importance of
learning the time to learn, we first use Monte Carlo tree search to find the
proper replay schedule and show that it can outperform fixed scheduling
policies in terms of continual learning performance. Moreover, to improve the
scheduling efficiency itself, we propose to use reinforcement learning to learn
the replay scheduling policies that can generalize to new continual learning
scenarios without added computational cost. In our experiments, we show the
advantages of learning the time to learn, which brings current continual
learning research closer to real-world needs.
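A minimal sketch of schedule search on a toy proxy objective. The paper uses Monte Carlo tree search; here the schedule space is small enough to enumerate exhaustively. The forgetting proxy (accuracy decays with time since a task was last rehearsed) is a hypothetical stand-in for actual continual-learning performance:

```python
import itertools

# Toy stand-in: 4 tasks arrive in sequence; at step t we may replay one
# earlier task. A schedule is a tuple (r_1, r_2, r_3) with r_t < t.
N_TASKS = 4

def evaluate(schedule):
    """Hypothetical proxy for average accuracy after the task sequence.

    Forgetting grows with how long ago a task was last rehearsed;
    replaying task r at step t resets its 'age'.
    """
    last_seen = {0: 0}
    for t in range(1, N_TASKS):
        last_seen[t] = t
        last_seen[schedule[t - 1]] = t       # rehearse the scheduled task
    final = N_TASKS - 1
    return sum(1.0 / (1 + final - s) for s in last_seen.values()) / N_TASKS

# Exhaustive search here; Monte Carlo tree search in the paper's setting.
candidates = list(itertools.product(*[range(t) for t in range(1, N_TASKS)]))
best = max(candidates, key=evaluate)
fixed = tuple(0 for _ in range(N_TASKS - 1))  # fixed policy: always replay task 0
print(best, evaluate(best), evaluate(fixed))
```

Even in this tiny example, the searched schedule beats the fixed policy, mirroring the abstract's claim that learned schedules outperform fixed scheduling policies.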
Full-Glow: Fully conditional Glow for more realistic image generation
Autonomous agents, such as driverless cars, require large amounts of labeled
visual data for their training. A viable approach for acquiring such data is
training a generative model with collected real data, and then augmenting the
collected real dataset with synthetic images from the model, generated with
control of the scene layout and ground truth labeling. In this paper we propose
Full-Glow, a fully conditional Glow-based architecture for generating plausible
and realistic images of novel street scenes given a semantic segmentation map
indicating the scene layout. Benchmark comparisons show our model to outperform
recent works in terms of the semantic segmentation performance of a pretrained
PSPNet. This indicates that images from our model are, to a higher degree than
from other models, similar to real images of the same kinds of scenes and
objects, making them suitable as training data for a visual semantic
segmentation or object recognition system.

Comment: 17 pages, 12 figures
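The building block that makes Glow-style models fully conditional is an invertible coupling layer whose scale and translation networks also see the conditioning input. The sketch below uses tiny random linear maps and a generic conditioning vector (standing in for features of a segmentation map); it is not the Full-Glow architecture, only the coupling mechanism:

```python
import numpy as np

rng = np.random.default_rng(0)

D, C = 8, 4                                  # data dim, conditioning dim
W_s = rng.normal(0, 0.1, (D // 2 + C, D // 2))
W_t = rng.normal(0, 0.1, (D // 2 + C, D // 2))

def st(x1, cond):
    """Scale/translation nets; the conditioning vector (e.g. features of a
    segmentation map) is concatenated to the coupling input."""
    h = np.concatenate([x1, cond])
    return np.tanh(h @ W_s), h @ W_t         # bounded scale for stability

def forward(x, cond):
    x1, x2 = x[: D // 2], x[D // 2 :]
    s, t = st(x1, cond)
    return np.concatenate([x1, x2 * np.exp(s) + t])

def inverse(y, cond):
    y1, y2 = y[: D // 2], y[D // 2 :]
    s, t = st(y1, cond)
    return np.concatenate([y1, (y2 - t) * np.exp(-s)])

x = rng.normal(size=D)
cond = rng.normal(size=C)                    # stand-in for layout conditioning
y = forward(x, cond)
x_rec = inverse(y, cond)
print(np.max(np.abs(x - x_rec)))             # near zero: exactly invertible
```

Exact invertibility is what lets flow models compute likelihoods and generate by sampling in latent space; conditioning every sub-network on the layout is what makes the generation controllable.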